Name: Prosper Loan dataset analysis

##                ListingKey ListingNumber           ListingCreationDate
## 1 1021339766868145413AB3B        193129 2007-08-26 19:09:29.263000000
## 2 10273602499503308B223C1       1209647 2014-02-27 08:28:07.900000000
## 3 0EE9337825851032864889A         81716 2007-01-05 15:00:47.090000000
## 4 0EF5356002482715299901A        658116 2012-10-22 11:02:35.010000000
## 5 0F023589499656230C5E3E2        909464 2013-09-14 18:38:39.097000000
## 6 0F05359734824199381F61D       1074836 2013-12-14 08:26:37.093000000
##   CreditGrade Term LoanStatus          ClosedDate BorrowerAPR BorrowerRate
## 1           C   36  Completed 2009-08-14 00:00:00     0.16516       0.1580
## 2               36    Current                         0.12016       0.0920
## 3          HR   36  Completed 2009-12-17 00:00:00     0.28269       0.2750
## 4               36    Current                         0.12528       0.0974
## 5               36    Current                         0.24614       0.2085
## 6               60    Current                         0.15425       0.1314
##   LenderYield EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## 1      0.1380                      NA            NA              NA
## 2      0.0820                 0.07960        0.0249         0.05470
## 3      0.2400                      NA            NA              NA
## 4      0.0874                 0.08490        0.0249         0.06000
## 5      0.1985                 0.18316        0.0925         0.09066
## 6      0.1214                 0.11567        0.0449         0.07077
##   ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## 1                      NA                                 NA
## 2                       6                     A            7
## 3                      NA                                 NA
## 4                       6                     A            9
## 5                       3                     D            4
## 6                       5                     B           10
##   ListingCategory..numeric. BorrowerState    Occupation EmploymentStatus
## 1                         0            CO         Other    Self-employed
## 2                         2            CO  Professional         Employed
## 3                         0            GA         Other    Not available
## 4                        16            GA Skilled Labor         Employed
## 5                         2            MN     Executive         Employed
## 6                         1            NM  Professional         Employed
##   EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## 1                        2                True             True
## 2                       44               False            False
## 3                       NA               False             True
## 4                      113                True            False
## 5                       44                True            False
## 6                       82                True            False
##                  GroupKey              DateCreditPulled
## 1                         2007-08-26 18:41:46.780000000
## 2                                   2014-02-27 08:28:14
## 3 783C3371218786870A73D20 2007-01-02 14:09:10.060000000
## 4                                   2012-10-22 11:02:32
## 5                                   2013-09-14 18:38:44
## 6                                   2013-12-14 08:26:40
##   CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine
## 1                   640                   659     2001-10-11 00:00:00
## 2                   680                   699     1996-03-18 00:00:00
## 3                   480                   499     2002-07-27 00:00:00
## 4                   800                   819     1983-02-28 00:00:00
## 5                   680                   699     2004-02-20 00:00:00
## 6                   740                   759     1973-03-01 00:00:00
##   CurrentCreditLines OpenCreditLines TotalCreditLinespast7years
## 1                  5               4                         12
## 2                 14              14                         29
## 3                 NA              NA                          3
## 4                  5               5                         29
## 5                 19              19                         49
## 6                 21              17                         49
##   OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
## 1                     1                          24                    3
## 2                    13                         389                    3
## 3                     0                           0                    0
## 4                     7                         115                    0
## 5                     6                         220                    1
## 6                    13                        1410                    0
##   TotalInquiries CurrentDelinquencies AmountDelinquent
## 1              3                    2              472
## 2              5                    0                0
## 3              1                    1               NA
## 4              1                    4            10056
## 5              9                    0                0
## 6              2                    0                0
##   DelinquenciesLast7Years PublicRecordsLast10Years
## 1                       4                        0
## 2                       0                        1
## 3                       0                        0
## 4                      14                        0
## 5                       0                        0
## 6                       0                        0
##   PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
## 1                         0                      0                0.00
## 2                         0                   3989                0.21
## 3                        NA                     NA                  NA
## 4                         0                   1444                0.04
## 5                         0                   6193                0.81
## 6                         0                  62999                0.39
##   AvailableBankcardCredit TotalTrades TradesNeverDelinquent..percentage.
## 1                    1500          11                               0.81
## 2                   10266          29                               1.00
## 3                      NA          NA                                 NA
## 4                   30754          26                               0.76
## 5                     695          39                               0.95
## 6                   86509          47                               1.00
##   TradesOpenedLast6Months DebtToIncomeRatio    IncomeRange
## 1                       0              0.17 $25,000-49,999
## 2                       2              0.18 $50,000-74,999
## 3                      NA              0.06  Not displayed
## 4                       0              0.15 $25,000-49,999
## 5                       2              0.26      $100,000+
## 6                       0              0.36      $100,000+
##   IncomeVerifiable StatedMonthlyIncome                 LoanKey
## 1             True            3083.333 E33A3400205839220442E84
## 2             True            6125.000 9E3B37071505919926B1D82
## 3             True            2083.333 6954337960046817851BCB2
## 4             True            2875.000 A0393664465886295619C51
## 5             True            9583.333 A180369302188889200689E
## 6             True            8333.333 C3D63702273952547E79520
##   TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
## 1                NA                         NA                    NA
## 2                NA                         NA                    NA
## 3                NA                         NA                    NA
## 4                NA                         NA                    NA
## 5                 1                         11                    11
## 6                NA                         NA                    NA
##   ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## 1                                  NA                              NA
## 2                                  NA                              NA
## 3                                  NA                              NA
## 4                                  NA                              NA
## 5                                   0                               0
## 6                                  NA                              NA
##   ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## 1                       NA                          NA
## 2                       NA                          NA
## 3                       NA                          NA
## 4                       NA                          NA
## 5                    11000                      9947.9
## 6                       NA                          NA
##   ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## 1                          NA                         0
## 2                          NA                         0
## 3                          NA                         0
## 4                          NA                         0
## 5                          NA                         0
## 6                          NA                         0
##   LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## 1                            NA                         78      19141
## 2                            NA                          0     134815
## 3                            NA                         86       6466
## 4                            NA                         16      77296
## 5                            NA                          6     102670
## 6                            NA                          3     123257
##   LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## 1               9425 2007-09-12 00:00:00                Q3 2007
## 2              10000 2014-03-03 00:00:00                Q1 2014
## 3               3001 2007-01-17 00:00:00                Q1 2007
## 4              10000 2012-11-01 00:00:00                Q4 2012
## 5              15000 2013-09-20 00:00:00                Q3 2013
## 6              15000 2013-12-24 00:00:00                Q4 2013
##                 MemberKey MonthlyLoanPayment LP_CustomerPayments
## 1 1F3E3376408759268057EDA             330.43            11396.14
## 2 1D13370546739025387B2F4             318.93                0.00
## 3 5F7033715035555618FA612             123.32             4186.63
## 4 9ADE356069835475068C6D2             321.45             5143.20
## 5 36CE356043264555721F06C             563.97             2819.85
## 6 874A3701157341738DE458F             342.37              679.34
##   LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## 1                      9425.00            1971.14        -133.18
## 2                         0.00               0.00           0.00
## 3                      3001.00            1185.63         -24.20
## 4                      4091.09            1052.11        -108.01
## 5                      1563.22            1256.63         -60.27
## 6                       351.89             327.45         -25.33
##   LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## 1                 0                     0                   0
## 2                 0                     0                   0
## 3                 0                     0                   0
## 4                 0                     0                   0
## 5                 0                     0                   0
## 6                 0                     0                   0
##   LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## 1                               0             1               0
## 2                               0             1               0
## 3                               0             1               0
## 4                               0             1               0
## 5                               0             1               0
## 6                               0             1               0
##   InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## 1                          0                           0       258
## 2                          0                           0         1
## 3                          0                           0        41
## 4                          0                           0       158
## 5                          0                           0        20
## 6                          0                           0         1

This dataset contains 113,937 loans with 81 variables. EDA will be performed on selected variables in three stages: univariate plots, bivariate plots, and multivariate plots. Prosper is a lending platform that is a good option for those who can’t get a loan from a traditional bank and don’t want the high interest rates of credit cards and payday loans. The workflow has two actors. Borrower: submits a loan application. Prosper: provides the loan after checking with several organizations that the borrower meets its criteria.
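As a minimal sketch of how a table like the preview above is typically loaded and inspected in base R: the data frame name `loandata` is taken from the test output later in this report, but the inline CSV below is a three-row stand-in built from values in the preview, not the real file. The original header spelling `ProsperRating (numeric)` is an assumption inferred from the dotted column name `ProsperRating..numeric.`.

```r
# Three-row stand-in for the real CSV (values copied from the preview above).
# The header "ProsperRating (numeric)" is an assumed original spelling.
csv_text <- 'ListingNumber,Term,LoanStatus,BorrowerRate,ProsperRating (numeric)
193129,36,Completed,0.1580,
1209647,36,Current,0.0920,6
81716,36,Completed,0.2750,'

# read.csv() cleans headers with make.names() by default (check.names = TRUE),
# which turns the space and the parentheses into dots.
loandata <- read.csv(text = csv_text, stringsAsFactors = FALSE)

dim(loandata)      # number of rows and columns
names(loandata)    # note "ProsperRating..numeric."
head(loandata, 2)  # first rows, as in the preview above
```

On the full file, `dim(loandata)` would be expected to report the 113,937 rows and 81 columns quoted above.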

Univariate Plots Section

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

From the histogram above, the most common loan amount is $4,000, followed by $10,000 and $15,000, so the peaks lie between $4,000 and $15,000. The summary puts the middle half of loans between $4,000 and $12,000.
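A minimal sketch of how a summary and histogram like this can be produced in base R. It uses only the six LoanOriginalAmount values from the preview at the top, so its numbers will not match the full-data summary; the 1,000-dollar bin width is an arbitrary choice, and the plotting library used for the original chart is not stated in this report.

```r
# LoanOriginalAmount for the six loans shown in the preview at the top.
loan_amount <- c(9425, 10000, 3001, 10000, 15000, 15000)

# Minimum, quartiles, median, mean and maximum in one call.
summary(loan_amount)

# Count how often each amount occurs, most frequent first.
amount_counts <- sort(table(loan_amount), decreasing = TRUE)
amount_counts

# Histogram with 1,000-dollar bins covering the 1,000 to 35,000 range.
hist(loan_amount, breaks = seq(0, 35000, by = 1000),
     main = "Loan original amount", xlab = "Amount (USD)")
```

Sorting the frequency table is a quick way to confirm which amounts form the tallest bars.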

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975

From the histogram above, the borrower rate is centered around 20% (mean 0.193, median 0.184), and most loans fall between 0.1 and 0.3 (10% to 30%).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00    4.00    6.00    5.95    8.00   11.00   29084

From the bar chart, borrowers’ Prosper scores range from 1 to 11 with a median of 6. The score is missing for 29,084 loans.

## 
##              Cancelled             Chargedoff              Completed 
##                      5                  11992                  38074 
##                Current              Defaulted FinalPaymentInProgress 
##                  56576                   5018                    205 
##   Past Due (>120 days)   Past Due (1-15 days)  Past Due (16-30 days) 
##                     16                    806                    265 
##  Past Due (31-60 days)  Past Due (61-90 days) Past Due (91-120 days) 
##                    363                    313                    304

##                Current              Completed             Chargedoff 
##                  56576                  38074                      0 
## FinalPaymentInProgress              Defaulted              Cancelled 
##                    205                   5018                      5 
##   Past Due (1-15 days)  Past Due (16-30 days)  Past Due (31-60 days) 
##                    806                    265                    363 
##  Past Due (61-90 days) Past Due (91-120 days)   Past Due (>120 days) 
##                    313                    304                     16 
##                   NA's 
##                  11992

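From the bar chart, most loans are current, followed by completed and then charged-off loans. A smaller number of loans are past due, meaning payment arrived after the due date. Note that in the reordered table Chargedoff shows 0 and its 11,992 loans appear under NA's, so the reordering did not carry that level over.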
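One way a category can end up counted as NA after reordering, as Chargedoff does in the second table above, is a level name that does not exactly match the data. The sketch below illustrates that mechanism on a toy vector; the misspelling `ChargedOff` is hypothetical, and this is not necessarily what happened in the original code.

```r
# Toy status vector; the spellings match the first table above.
status <- c("Current", "Completed", "Chargedoff", "Defaulted", "Chargedoff")

# A level spelled differently from the data ("ChargedOff", a hypothetical typo)
# matches nothing, so those rows silently become NA.
bad <- factor(status,
              levels = c("Current", "Completed", "ChargedOff", "Defaulted"))
table(bad, useNA = "ifany")

# Safer: check that every value in the data appears in the wanted order
# before converting, so a mismatch stops the script instead of creating NAs.
wanted <- c("Current", "Completed", "Chargedoff", "Defaulted")
stopifnot(all(unique(status) %in% wanted))
good <- factor(status, levels = wanted)
table(good, useNA = "ifany")
```

The `stopifnot()` guard is cheap and makes a dropped level fail loudly rather than shift a bar in the chart.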

##      Employed     Full-time     Part-time Self-employed       Retired 
##         67322         26355          1088          6134           795 
##  Not employed Not available         Other            NA          NA's 
##           835          5347          3806             0          2255

From the bar chart above, most borrowers are listed as Employed, followed by Full-time.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3200    4667    5608    6825 1750003

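From the histogram above, the $25,000 to $75,000 range describes annual income rather than monthly income. The summary of StatedMonthlyIncome shows a median of $4,667 and a middle half between $3,200 and $6,825 per month, roughly $38,000 to $82,000 per year. The maximum of $1,750,003 is an extreme outlier.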
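The link between StatedMonthlyIncome and the annual IncomeRange column can be checked on the preview rows at the top. This is a sketch under assumptions: the bin edges are inferred from the labels, and the two labels that do not appear in the preview (`$0-24,999` and `$75,000-99,999`) are guesses.

```r
# StatedMonthlyIncome and IncomeRange for preview rows 1, 2, 4, 5 and 6
# (row 3 is left out because its IncomeRange is "Not displayed").
monthly <- c(3083.333, 6125.000, 2875.000, 9583.333, 8333.333)
listed_range <- c("$25,000-49,999", "$50,000-74,999", "$25,000-49,999",
                  "$100,000+", "$100,000+")

# Annualise; round() absorbs the three-decimal rounding in the printout.
annual <- round(monthly * 12)

# Bin edges inferred from the labels; the labels "$0-24,999" and
# "$75,000-99,999" are assumptions, not taken from the preview.
bins <- cut(annual,
            breaks = c(0, 25000, 50000, 75000, 100000, Inf),
            labels = c("$0-24,999", "$25,000-49,999", "$50,000-74,999",
                       "$75,000-99,999", "$100,000+"),
            right = FALSE)

# Side-by-side comparison of the computed bin and the listed range.
data.frame(monthly, annual, bins, listed_range)
```

For these five rows the computed bin agrees with the listed IncomeRange, which supports reading the dollar ranges as yearly figures.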

Univariate Analysis

What is the structure of your dataset?

This dataset contains 113,937 loans with 81 variables. Prosper is a lending platform that is a good option for those who can’t get a loan from a traditional bank and don’t want the high interest rates of credit cards and payday loans. The workflow has two actors. Borrower: submits a loan application. Prosper: provides the loan after checking with several organizations that the borrower meets its criteria.

What is/are the main feature(s) of interest in your dataset?

The main features of interest in the dataset are BorrowerRate and ProsperScore. The investigation suggests a relationship between them, with ProsperScore affecting BorrowerRate.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Other features that might affect BorrowerRate: LoanStatus, EmploymentStatus, StatedMonthlyIncome, and LoanOriginalAmount.

Did you create any new variables from existing variables in the dataset?

No

Of the features you investigated, were there any unusual distributions?
I reordered the levels of LoanStatus and EmploymentStatus because their default order is not meaningful.

Bivariate Plots Section

## 
##  Pearson's product-moment correlation
## 
## data:  loandata$BorrowerRate and loandata$ProsperScore
## t = -248.98, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.6536072 -0.6458311
## sample estimates:
##        cor 
## -0.6497361

From the chart above, BorrowerRate decreases as ProsperScore increases. The correlation coefficient is -0.65, so BorrowerRate has a strong negative relationship with ProsperScore.
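The numbers in the test output above can be cross-checked by hand. The sketch below assumes that the only missing values in the pair come from ProsperScore (the BorrowerRate summary lists no NA's), and recomputes the degrees of freedom and the t statistic from the reported correlation.

```r
n_rows    <- 113937      # loans in the dataset
n_missing <- 29084       # NA's reported for ProsperScore
r         <- -0.6497361  # reported Pearson correlation

# Only complete pairs enter the test, and the t test for a Pearson
# correlation has n - 2 degrees of freedom.
n_complete <- n_rows - n_missing
df <- n_complete - 2

# t statistic for a Pearson correlation: t = r * sqrt(df) / sqrt(1 - r^2)
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

c(df = df, t = round(t_stat, 2))
```

The result agrees with the reported df = 84851 and t = -248.98, so the test was effectively run on the 84,853 loans that have a Prosper score.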

## 
##  Pearson's product-moment correlation
## 
## data:  loandata$BorrowerRate and loandata$StatedMonthlyIncome
## t = -30.155, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.09473938 -0.08321827
## sample estimates:
##        cor 
## -0.0889818

From the chart above, BorrowerRate decreases slightly as StatedMonthlyIncome increases. The correlation coefficient is -0.089, so BorrowerRate has only a weak relationship with StatedMonthlyIncome.

From the chart above, the mean BorrowerRate for current and completed loans is lower than for late payments in general (i.e. past due), defaulted, and charged-off loans.

From the chart above, the mean BorrowerRate for not-employed borrowers is higher than for the other groups.
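Group means like the ones compared above can be computed with `tapply()` or `aggregate()`. The sketch below only shows the mechanics on the six preview rows at the top; with so few rows its values are not expected to reproduce the pattern seen in the full data, and the data frame name `sample_rows` is made up for this example.

```r
# BorrowerRate and LoanStatus for the six preview rows at the top.
sample_rows <- data.frame(
  rate   = c(0.1580, 0.0920, 0.2750, 0.0974, 0.2085, 0.1314),
  status = c("Completed", "Current", "Completed",
             "Current", "Current", "Current"),
  stringsAsFactors = FALSE
)

# Mean rate per status; tapply() returns a named vector.
mean_rate <- tapply(sample_rows$rate, sample_rows$status, mean)
mean_rate

# Same calculation returned as a data frame, convenient for plotting.
aggregate(rate ~ status, data = sample_rows, FUN = mean)
```

Replacing `status` with an employment-status column gives the per-employment means in the same way.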

From the chart above, current loans have the highest StatedMonthlyIncome, and late payments in general (i.e. past due) have a lower StatedMonthlyIncome.

The chart above shows the Prosper score against the borrower's assured monthly payment for all loan statuses, excluding the Completed status.

The chart above shows the Prosper score against the borrower's assured monthly payment for all loan statuses, excluding the Current status.

The chart above shows the Prosper score against the borrower's assured monthly payment for all loan statuses, excluding the PastDue status.

The chart above shows the Prosper score against the borrower's assured monthly payment for all loan statuses, excluding the Chargedoff status.

The chart above shows the Prosper score against the borrower's assured monthly payment for all loan statuses, excluding the Defaulted status.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation.

We observe the following:

The mean BorrowerRate decreases as ProsperScore increases. The correlation coefficient is -0.65, so BorrowerRate has a strong relationship with ProsperScore.

The mean BorrowerRate decreases as LoanOriginalAmount increases. The correlation coefficient is -0.33, so BorrowerRate has a moderate relationship with LoanOriginalAmount.

The mean BorrowerRate decreases as StatedMonthlyIncome increases. The correlation coefficient is -0.088, so BorrowerRate has a weak relationship with StatedMonthlyIncome.

The mean BorrowerRate for current and completed loans is lower than for late payments in general (i.e. past due), defaulted, and charged-off loans. The mean BorrowerRate for not-employed borrowers is higher than for the other groups.
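The strong, moderate and weak labels used for the three correlations can be made explicit with a small helper. The cutoffs of 0.3 and 0.5 are an assumption (a common rule of thumb), not thresholds stated in this report, and the function name `label_strength` is made up for this sketch.

```r
# Label |r| with assumed cutoffs (0.3 and 0.5); these thresholds are a
# rule of thumb chosen for illustration, not taken from this report.
label_strength <- function(r) {
  cut(abs(r),
      breaks = c(0, 0.3, 0.5, 1),
      labels = c("weak", "moderate", "strong"),
      include.lowest = TRUE, right = FALSE)
}

# Correlations of BorrowerRate with each variable, as reported above.
r_values <- c(ProsperScore = -0.65,
              LoanOriginalAmount = -0.33,
              StatedMonthlyIncome = -0.088)

data.frame(r = r_values, strength = label_strength(r_values))
```

With these cutoffs the helper reproduces the wording used in the list above; different cutoffs would move the -0.33 case.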

Did you observe any interesting relationships between the other features?

There is a relationship between LoanStatus and StatedMonthlyIncome: current loans have the highest StatedMonthlyIncome, and late payments in general (i.e. past due) have a lower StatedMonthlyIncome.

What was the strongest relationship you found?

The strongest relationship is between BorrowerRate and ProsperScore: the mean BorrowerRate decreases as ProsperScore increases. BorrowerRate also has a moderate relationship with LoanOriginalAmount. BorrowerRate therefore appears to be affected by both ProsperScore and LoanOriginalAmount.

Multivariate Plots Section

From the chart above, many current and completed loans combine a lower BorrowerRate with a higher ProsperScore.

The chart above confirms that the mean BorrowerRate decreases as ProsperScore increases for loan statuses such as current and completed.

The chart above confirms that the mean BorrowerRate decreases as ProsperScore increases across employment statuses.
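One way to check the same relationship within each category numerically is to compute the correlation per group. The sketch below uses the six preview rows at the top, of which only the Current loans have a Prosper score, so it yields a single group; the names `d`, `d_ok` and `by_status` are made up for this example.

```r
# Preview rows at the top: rate, score (NA where none is shown), status.
d <- data.frame(
  BorrowerRate = c(0.1580, 0.0920, 0.2750, 0.0974, 0.2085, 0.1314),
  ProsperScore = c(NA, 7, NA, 9, 4, 10),
  LoanStatus   = c("Completed", "Current", "Completed",
                   "Current", "Current", "Current"),
  stringsAsFactors = FALSE
)

# Keep rows where both variables are present, then split by status.
d_ok <- d[complete.cases(d[, c("BorrowerRate", "ProsperScore")]), ]
by_status <- split(d_ok, d_ok$LoanStatus, drop = TRUE)

# Pearson correlation of rate and score within each status.
r_by_status <- sapply(by_status,
                      function(g) cor(g$BorrowerRate, g$ProsperScore))
r_by_status
```

Swapping `LoanStatus` for an employment-status column gives the per-employment version of the same check.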

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation.

Loans with a lower BorrowerRate tend to have a higher ProsperScore, and this is confirmed for some LoanStatus values such as completed and current.

Were there any interesting or surprising interactions between features?

The interesting relationship between BorrowerRate, ProsperScore, and LoanStatus was confirmed above.


Final Plots and Summary

Plot One

Description One

The plot above shows the number of loans by BorrowerRate. Rates run from about 0.05 to about 0.35, with a peak at about 0.15.

Plot Two

## 
##  Pearson's product-moment correlation
## 
## data:  loandata$BorrowerRate and loandata$ProsperScore
## t = -248.98, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.6536072 -0.6458311
## sample estimates:
##        cor 
## -0.6497361

Description Two

BorrowerRate decreases as ProsperScore increases. The correlation coefficient is -0.65, so BorrowerRate has a strong negative relationship with ProsperScore.

Plot Three

Description Three

The mean BorrowerRate decreases as ProsperScore increases for loan statuses such as current and completed.

Reflection

This dataset contains 113,937 loans with 81 variables. EDA has been performed on selected variables: BorrowerRate, ProsperScore, LoanOriginalAmount, LoanStatus, StatedMonthlyIncome, and EmploymentStatus. First came univariate plots for single variables, then bivariate plots for pairs of variables, and finally multivariate plots combining categorical and continuous variables. I focused on BorrowerRate and its relationship with the other variables. We found the following:

The mean BorrowerRate decreases as ProsperScore increases. BorrowerRate has a strong relationship with ProsperScore.

The mean BorrowerRate decreases as LoanOriginalAmount increases. BorrowerRate has a moderate relationship with LoanOriginalAmount.

The mean BorrowerRate decreases as StatedMonthlyIncome increases. BorrowerRate has a weak relationship with StatedMonthlyIncome.

The mean BorrowerRate for current and completed loans is lower than for late payments in general (i.e. past due), defaulted, and charged-off loans. The mean BorrowerRate for not-employed borrowers is higher than for the other groups.

Loans with a lower BorrowerRate tend to have a higher ProsperScore, and this is confirmed for some LoanStatus values such as completed and current.

For future work, I may explore more variables over time to improve the process workflow.